Robustness and Sample Complexity of Model-Based MARL for General-Sum Markov Games


Abstract

Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most existing literature on MARL concentrates on zero-sum Markov games, but these results are not applicable to general-sum Markov games. It is known that the best-response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to completely characterize the equilibrium. Given these challenges, model-based learning is an attractive approach for MARL in general-sum Markov games. In this paper, we investigate the fundamental question of the sample complexity of model-based MARL algorithms for general-sum Markov games. We show two results. We first use Hoeffding inequality-based bounds to show that $\tilde{\mathcal{O}}((1-\gamma)^{-4}\alpha^{-2})$ samples per state–action pair are sufficient to obtain an $\alpha$-approximate Markov perfect equilibrium with high probability, where $\gamma$ is the discount factor and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms. We then use Bernstein inequality-based bounds to show that $\tilde{\mathcal{O}}((1-\gamma)^{-1}\alpha^{-2})$ samples are sufficient. To obtain these results, we study the robustness of Markov games to model approximations. We show that a Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game, and we provide explicit bounds on the approximation error. We illustrate the results via a numerical example.
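As a quick illustration of the gap between the two sample-complexity rates in the abstract, the sketch below evaluates both bounds for a sample discount factor and accuracy. This is only the leading rate: constants and the logarithmic factors hidden by the tilde-O notation are ignored, so the numbers are illustrative rather than the paper's actual sample counts.

```python
# Illustrative comparison of the two sample-complexity rates from the
# abstract, ignoring constants and log factors hidden by the tilde-O.

def hoeffding_samples(gamma: float, alpha: float) -> float:
    """Hoeffding-based rate: (1-gamma)^-4 * alpha^-2 per state-action pair."""
    return (1 - gamma) ** -4 * alpha ** -2

def bernstein_samples(gamma: float, alpha: float) -> float:
    """Bernstein-based rate: (1-gamma)^-1 * alpha^-2 per state-action pair."""
    return (1 - gamma) ** -1 * alpha ** -2

gamma, alpha = 0.9, 0.1
h = hoeffding_samples(gamma, alpha)
b = bernstein_samples(gamma, alpha)
# The Bernstein-based rate improves on Hoeffding by a (1-gamma)^-3 factor,
# which matters most as gamma approaches 1.
print(f"Hoeffding rate: {h:.0f}, Bernstein rate: {b:.0f}, ratio: {h / b:.0f}")
```

For $\gamma = 0.9$ the ratio between the two rates is already $(1-\gamma)^{-3} = 1000$, which is why the Bernstein-based analysis is the paper's sharper second result.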


Related Articles

Taking turns in general sum Markov games

This paper provides a novel approach to multi-agent coordination in general sum Markov games. Contrary to what is common in multi-agent learning, our approach does not focus on reaching a particular equilibrium between agent policies. Instead, it learns a basis set of special joint agent policies, over which it can randomize to build different solutions. The main idea is to tackle a Markov game...


General-Sum Games: Correlated Equilibria

This lecture introduces a generalization of Nash equilibrium due to Aumann [1] known as correlated equilibrium, which allows for possible dependencies in strategic choices. A daily example of a correlated equilibrium is a traffic light: a red (green) signal suggests that cars should stop (go), and following each suggestion is of course rational. Following Aumann [2], we present two definitions ...
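The traffic-light example above can be checked numerically. The sketch below uses hypothetical payoffs for a two-driver "chicken"-style game (the lecture does not specify numbers, so these are assumptions) and verifies that obeying the signal's recommendation is a correlated equilibrium: no driver gains by deviating from what the light tells them.

```python
# Minimal correlated-equilibrium check for a two-driver intersection game.
# Payoffs are hypothetical; actions: 0 = Go, 1 = Stop.

# payoff[profile] = (payoff to driver 1, payoff to driver 2)
payoff = {
    (0, 0): (-10, -10),  # both go: crash
    (0, 1): (1, 0),      # driver 1 goes, driver 2 waits
    (1, 0): (0, 1),      # driver 2 goes, driver 1 waits
    (1, 1): (-1, -1),    # both wait: lost time
}

# The traffic light recommends (Go, Stop) or (Stop, Go) with equal probability.
signal = {(0, 1): 0.5, (1, 0): 0.5}

def is_correlated_equilibrium(payoff, dist, actions=(0, 1)) -> bool:
    """True if no player gains in expectation by deviating from any
    recommended action, conditional on having received that recommendation."""
    for player in (0, 1):
        for rec in actions:          # recommended action
            for dev in actions:      # candidate deviation
                gain = 0.0
                for profile, p in dist.items():
                    if profile[player] != rec:
                        continue
                    deviated = list(profile)
                    deviated[player] = dev
                    gain += p * (payoff[tuple(deviated)][player]
                                 - payoff[profile][player])
                if gain > 1e-12:
                    return False
    return True

print(is_correlated_equilibrium(payoff, signal))  # expect: True
```

Note that the signal randomizes over pure profiles that are each individually rational once revealed, which is exactly the dependency between strategic choices that a Nash equilibrium (a product distribution) cannot express.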


Approximation Results on Sampling Techniques for Zero-sum, Discounted Markov Games

We extend the “policy rollout” sampling technique for Markov decision processes to Markov games, and provide an approximation result guaranteeing that the resulting sampling-based policy is closer to the Nash equilibrium than the underlying base policy. This improvement is achieved with an amount of sampling that is independent of the state-space size. We base our approximation result on a more...


Timed Parity Games: Complexity and Robustness

We consider two-player games played in real time on game structures with clocks and parity objectives. The games are concurrent in that at each turn, both players independently propose a time delay and an action, and the action with the shorter delay is chosen. To prevent a player from winning by blocking time, we restrict each player to strategies that ensure that the player cannot be responsi...


Exploitation and Safety in General Sum Games

We describe a method for an agent playing a general-sum normal form game to balance the rewards of exploiting a prediction of opponent behavior with the risks of being exploited by a self-interested opponent, while guaranteeing a worst-case safety margin. Our algorithm, Restricted Stackelberg Response with Safety, calculates a probability distribution over the agent’s moves that balances those co...



Journal

Journal title: Dynamic Games and Applications

Year: 2023

ISSN: 2153-0793, 2153-0785

DOI: https://doi.org/10.1007/s13235-023-00490-2